Susanne Still: Interactive learning


Published January 10, 2010


Talk by Susanne Still of the University of Hawaii, given on January 6, 2010, for the Redwood Center for Theoretical Neuroscience at UC Berkeley.

Abstract.
I present a quantitative approach to interactive learning and adaptive behavior which integrates model-making and decision-making into one theoretical framework. This approach follows simple principles, requiring that the observer's behavior and the observer's internal representation of the world result in maximal predictive power at minimal complexity. Classes of optimal action policies and of optimal models can be derived from an objective function that reflects this trade-off between prediction and complexity. The resulting optimal models then summarize, at different levels of abstraction, the process's causal organization in the presence of the feedback due to the learner's actions. A fundamental consequence of the proposed principle is that the optimal action policies have the emergent property of balancing exploration and control. Interestingly, the explorative component is also present in the absence of policy randomness, i.e., in the optimal deterministic behavior. Exploration is therefore not the same as policy randomization. This is a direct result of requiring maximal predictive power in the presence of feedback, and it stands in contrast to, for example, Boltzmann exploration, which is frequently used in Reinforcement Learning (RL). Time permitting, I will discuss what happens when one includes explicit goals and rewards in the theory, as is popular in RL.
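
For reference, the objective function mentioned in the abstract can be written out. The following is a hedged sketch in LaTeX notation; the symbols (s for the internal state, a for the action, z for future observations, h for the history) and the trade-off weights lambda and mu follow the reviewer's summary further below, not a formula taken verbatim from the talk:

\max_{\text{model},\ \text{policy}} \; I[\{s, a\}; z] \;-\; \lambda\, I[s; h] \;-\; \mu\, I[a; h]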


Producer: Redwood Center for Theoretical Neuroscience
Audio/Visual: sound, color

Reviews

Reviewer: JuanTheorNeuro - 5/5 stars - June 30, 2010
Subject: a very interactive lecture
The lecture is very good and very interactive too (!). Prof. Still proposes the term "Interactive learning" to differentiate it from active learning. The difference is that there is real feedback from the learner, which has the capacity to change the probability distributions that underlie the world's data-generating mechanism (in active learning these distributions do not change). The motivation is to create a model of the world that generates good (optimal) predictions with minimal information. The action policy is not chosen to maximize reward, food, or energy, but prediction. This is learning whose goal is to make the world predictable with the simplest policy (i.e., good prediction is the reward), but without long-term action planning. The model's predictive ability is measured by the mutual information that the internal state, in the presence of the action, contains about the future. The goal can be expressed as:
max ( I[{s,a}; z] - lambda * I[s; h] - mu * I[a; h] )
Here s denotes the states, a the actions, z the future observations, and h the history. The lambda and mu terms express, respectively, the pressure to reduce model complexity and policy complexity in this objective.
Juan F Gomez-Molina, Intl Group of Neuroscience
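
To make the reviewer's objective concrete, here is a minimal numerical sketch in Python. The toy joint distribution over (h, s, a, z), its alphabet sizes, and the weights lambda and mu are invented for illustration only and are not taken from the talk; the sketch simply evaluates the three mutual-information terms from a joint probability table.

import numpy as np

def mutual_information(p_xy):
    # Mutual information I[X; Y] in bits, computed from a joint probability table p(x, y).
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

rng = np.random.default_rng(0)

# Toy joint distribution over (history h, internal state s, action a, future observation z).
# The alphabet sizes 4, 3, 2, 4 are arbitrary choices for this sketch.
p = rng.random((4, 3, 2, 4))
p /= p.sum()

# Predictive power I[{s,a}; z]: marginalize out h and treat the pair (s, a) as one variable.
predictive_power = mutual_information(p.sum(axis=0).reshape(3 * 2, 4))

# Complexity terms: I[s; h] for the internal model, I[a; h] for the action policy.
model_complexity = mutual_information(p.sum(axis=(2, 3)))   # joint table p(h, s)
policy_complexity = mutual_information(p.sum(axis=(1, 3)))  # joint table p(h, a)

lam, mu = 0.1, 0.1  # trade-off weights; arbitrary values for this sketch
objective = predictive_power - lam * model_complexity - mu * policy_complexity
print("I[{s,a};z] = %.3f, I[s;h] = %.3f, I[a;h] = %.3f, objective = %.3f"
      % (predictive_power, model_complexity, policy_complexity, objective))

In an actual interactive-learning setting one would compare this objective across candidate models and action policies rather than evaluate it for a single random table; the sketch only shows how the three terms fit together.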
SIMILAR ITEMS (based on metadata)
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 3,381 views, 0 favorites, 0 comments
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 377 views, 0 favorites, 0 comments
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 291 views, 0 favorites, 0 comments
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 309 views, 0 favorites, 0 comments
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 1,043 views, 0 favorites, 0 comments
Community Video (movies): 745 views, 0 favorites, 0 comments
Community Video (movies): 306 views, 1 favorite, 0 comments
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 216 views, 0 favorites, 0 comments
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 122 views, 0 favorites, 0 comments
Community Video (movies) by Redwood Center for Theoretical Neuroscience: 713 views, 0 favorites, 0 comments